67 research outputs found

    Inter-Coder Agreement for Computational Linguistics

    This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes the interpretation of the value of the coefficient even harder.

    Nominalization and Alternations in Biomedical Language

    Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of "stimulate" in "FSH stimulates follicular development" and "follicular development is stimulated by FSH". The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

    Coordination of Word Parts is Interpreted at Surface Level *

    In this paper I argue that coordination of parts of words, as in (1) below, has to be interpreted at the level of the visible string; as a consequence, the semantics must assign separate meanings to the word parts "ortho", "perio", and "dontists" (an orthodontist is a specialist in straightening teeth; a periodontist specializes in gum disease).

    (1) ortho and periodontists

    The evidence comes from the interpretation of plural morphology. Specifically, the NP "ortho and periodontists" is not synonymous with "orthodontists and periodontists". Suppose that Bill is an orthodontist and Martha is a periodontist; then sentence (2) below has a reading on which it is true, whereas sentence (3) does not have a true reading.

    (2) Bill and Martha are ortho and periodontists.
    (3) #Bill and Martha are orthodontists and periodontists.

    The relevant structure for the interpretation of "ortho and periodontists" must therefore be different from that of "orthodontists and periodontists". Notice that the contrast between (2) and (3) is similar to the contrast between (4) and (5) below: only the former receives a coherent reading.

    (4) Konishki and Takanohana are heavy and light sumo wrestlers.
    (5) #Konishki and Takanohana are heavy sumo wrestlers and light sumo wrestlers.

    * Earlier versions of this paper were presented at TLS-2001 at the University of Texas and to the Rutgers semantics group; I thank the audiences for their comments and insights. I owe particular thanks to Rajes